Features Extraction Algorithm from Sgml for Classification
Abstract
The basic phases in text categorization are preprocessing features, extracting relevant features by comparison with the features stored in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses on the development of algorithms and computing techniques, while the algorithm for preprocessing features is treated as a "black box" and largely ignored. It is therefore worthwhile to develop a preprocessing algorithm that beginners can use before going deeper into the field of text categorization. This research proposes an algorithm for preprocessing features that uses Microsoft .NET Framework technology. The implementation shows that the algorithm can extract the features of interest from a standard corpus and upload them into a relational database.
Keywords: preprocessing, text categorization, algorithm, .NET
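The preprocessing flow described in the abstract (read SGML-tagged documents, extract the features of interest, upload them into a relational database) can be sketched roughly as follows. This is a minimal illustration in Python with SQLite, not the authors' .NET implementation. The tag names (`DOC`, `TOPIC`, `BODY`), the stopword list, the table layout, and all function names are assumptions made for the example.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical SGML sample; the tag names are assumptions, not the paper's corpus format.
SAMPLE_SGML = """
<DOC ID="1"><TOPIC>grain</TOPIC><BODY>Wheat prices rose as wheat exports grew.</BODY></DOC>
<DOC ID="2"><TOPIC>trade</TOPIC><BODY>Trade talks on tariffs resumed.</BODY></DOC>
"""

DOC_RE = re.compile(r'<DOC ID="(\d+)">(.*?)</DOC>', re.DOTALL)
TOPIC_RE = re.compile(r"<TOPIC>(.*?)</TOPIC>", re.DOTALL)
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "as", "of", "on", "the"}


def extract_documents(sgml):
    """Return a list of (doc_id, topic, body) tuples found in the SGML text."""
    docs = []
    for doc_id, inner in DOC_RE.findall(sgml):
        topic = TOPIC_RE.search(inner)
        body = BODY_RE.search(inner)
        docs.append((
            int(doc_id),
            topic.group(1).strip() if topic else "",
            body.group(1).strip() if body else "",
        ))
    return docs


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def load_into_database(conn, docs):
    """Store each document and its term frequencies in two relational tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS document (id INTEGER PRIMARY KEY, topic TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS feature (doc_id INTEGER, term TEXT, freq INTEGER)")
    for doc_id, topic, body in docs:
        conn.execute("INSERT INTO document VALUES (?, ?)", (doc_id, topic))
        counts = Counter(tokenize(body))
        conn.executemany(
            "INSERT INTO feature VALUES (?, ?, ?)",
            [(doc_id, term, n) for term, n in counts.items()],
        )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_into_database(connection, extract_documents(SAMPLE_SGML))
    for row in connection.execute("SELECT doc_id, term, freq FROM feature ORDER BY doc_id, term"):
        print(row)
```

Regular expressions are enough for this flat, well-formed sample. A real SGML corpus with entities or nested markup would likely need a proper parser, and a later categorization step would read the stored term frequencies from the `feature` table.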
Similar resources
Optimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm
Introduction: Raman spectroscopy, a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so it can be used to study molecular changes in a sample. Materials and Methods: In this research, 153 Raman spectra were obtained from normal and dried skin samples. Baseline and electrical noise were eliminat...
Heart Rate Variability Classification using Support Vector Machine and Genetic Algorithm
Background: The electrocardiogram (ECG) is an electrical signal that represents cardiac activity. Heart rate variability (HRV), the variation of the interval between two consecutive heartbeats, represents the balance between the sympathetic and parasympathetic branches of the autonomic nervous system. Objective: In this study, we aimed to evaluate the efficiency of discrete wavelet transfo...
Classification of ECG signals using Hermite functions and MLP neural networks
Classification of heart arrhythmia is an important step in developing devices for monitoring the health of individuals. This paper proposes a three-module system for classification of electrocardiogram (ECG) beats. The modules are a denoising module, a feature extraction module, and a classification module. In the first module the stationary wavelet transform (SWF) is used for noise reduction of ...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by noise removal, binarization, and detection and correction of image skew. In document segmentation, an algorith...
Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...
Classification of polarimetric radar images based on SVM and BGSA
Classification of land cover is one of the most important applications of radar polarimetry images. The purpose of image classification is to classify image pixels into different classes based on vector properties of the extractor. Radar imaging systems provide useful information about ground cover by using a wide range of electromagnetic waves to image the Earth's surface. The purpose ...
Publication date: 2007